modify split_qkv_rmsnorm_rope #282
* upstream/main:
  modify split_qkv_rmsnorm_rope (sgl-project#282)
  bump version to 2025.12.25 (sgl-project#281)
  l2 norm const parameter change (sgl-project#276)
  Fix the issue of HCCL buffer tiling verification failure during one round of testing. (sgl-project#280)
|
This PR may affect the current Qwen3 model support; see sgl-project/sglang#12078.
Yes, I have considered this. This PR makes the normalization step of the split_qkv_rmsnorm_rope operator optional, which enables support for Llama models. A related PR has already been submitted to SGLang and is awaiting merge.
|
@Liwansi Great! Could you please share a link to the related PR?
…pu-old into bugfix

* 'a3_topk-1' of https://github.com/luanyundu/sgl-kernel-npu-old:
  fix dispatch_layout to support topk -1 feature
  optimize gdn gating and fused_qkvzba_split_reshape_cat (sgl-project#306)
  fix layout numTokensPerExpertTensor partial Initialization bug (sgl-project#303)
  Supplement A2 doc, software and hardware compatibility info (sgl-project#294)
  Added an environment variable to control whether to enable the Combine Ant Migration feature. (sgl-project#304)
  Support build with cann 8.5 (sgl-project#283)
  LoRA: Optimization LoRA kernels and refactoring (sgl-project#284)
  fix a2 single combine aclnn params
  Resolving the UB out-of-bounds issue caused by A2 dual-machine mixed operation (sgl-project#288)
  fix notify magic auto-increment bug (sgl-project#291)
  split_qkv_rmsnorm_rope bugfix (sgl-project#290)
  Optimize prepare_lens by removing device transfer (sgl-project#289)
  Fix the performance degradation issue of the single-wheel operation in Ant Moving. (sgl-project#287)
  modify split_qkv_rmsnorm_rope (sgl-project#282)
Make the normalization optional to support Llama models.
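
For readers unfamiliar with the operator, the following is a minimal pure-Python reference sketch of what "optional normalization" could look like for a single token. It is not the NPU kernel from this PR: the function signature, the parameter names (`q_weight`, `k_weight`, `eps`), and the rotate-half RoPE layout are all assumptions made for illustration. Passing norm weights mimics a Qwen3-style model that applies RMSNorm to Q and K per head; passing `None` skips the norm, as a Llama-style model would.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMS-normalize one head vector and scale it by a per-element weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def rope(x, cos, sin):
    """Rotate-half RoPE on one head vector; cos/sin hold len(x) // 2 values."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return first + second


def split_qkv_rmsnorm_rope(qkv, num_q_heads, num_kv_heads, head_dim,
                           cos, sin, q_weight=None, k_weight=None, eps=1e-6):
    """Hypothetical reference: split a fused QKV vector, optionally
    RMS-normalize Q and K per head, then apply RoPE to Q and K."""
    q_size = num_q_heads * head_dim
    kv_size = num_kv_heads * head_dim
    q = qkv[:q_size]
    k = qkv[q_size:q_size + kv_size]
    v = qkv[q_size + kv_size:]

    def process(flat, weight):
        out = []
        for i in range(0, len(flat), head_dim):
            head = flat[i:i + head_dim]
            if weight is not None:  # normalization is optional
                head = rms_norm(head, weight, eps)
            out.extend(rope(head, cos, sin))
        return out

    # V is only split out; it receives neither the norm nor RoPE.
    return process(q, q_weight), process(k, k_weight), v
```

With this shape, the same entry point serves both model families: the caller decides whether to supply norm weights, and the split and RoPE paths are unchanged either way.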